Goto

Collaborating Authors

 approximate best response



Computational Aspects of Bayesian Persuasion under Approximate Best Response

Neural Information Processing Systems

We study Bayesian persuasion under approximate best response, where the receiver may choose any action that is not too much suboptimal, given their posterior belief upon receiving the signal. We focus on the computational aspects of the problem, aiming to design algorithms that efficiently compute (almost) optimal strategies for the sender. Despite the absence of the revelation principle --- which has been one of the most powerful tools in Bayesian persuasion --- we design polynomial-time exact algorithms for the problem when either the state space or the action space is small, as well as a quasi-polynomial-time approximation scheme (QPTAS) for the general problem. On the negative side, we show there is no polynomial-time exact algorithm for the general problem unless \mathsf{P} \mathsf{NP} . Our results build on several new algorithmic ideas, which might be useful in other principal-agent problems where robustness is desired.



Generalized Principal-Agent Problem with a Learning Agent

arXiv.org Artificial Intelligence

Generalized principal-agent problems, including Stackelberg games, contract design, and Bayesian persuasion, are a class of economic problems where an agent best responds to a principal's committed strategy. We study repeated generalized principal-agent problems under the assumption that the principal does not have commitment power and the agent uses algorithms to learn to respond to the principal. We reduce this problem to a one-shot generalized principal-agent problem with an approximately-best-responding agent. Using this reduction, we show that: (1) if the agent uses contextual no-regret learning algorithms, then the principal can guarantee a utility that is at least the principal's optimal utility in the classic non-learning model minus the square root of the agent's regret; (2) if the agent uses contextual no-swap-regret learning algorithms, then the principal cannot obtain any utility more than the optimal utility in the non-learning model plus the agent's swap regret. But (3) if the agent uses mean-based learning algorithms (which can be no-regret but not no-swap-regret), then the principal can do significantly better than the non-learning model. These general results not only refine previous results in Stackelberg games and contract design with learning agents but also lead to new results for Bayesian persuasion with a learning agent.


A Unified Game-Theoretic Approach to Multi-agent Reinforcement Learning

#artificialintelligence

Today we will dig into a paper ripped of A Unified Game-Theoretic Approach to Multi-agent Reinforcement Learning, one of the core ideas that has been used for the development of #AlphaStar . There are several concepts in AlphaStar that won t be treated here . The aim is to dig in the concepts that what has been as the "Nash League" conceptual functioning and how game theory came to mix with reinforcement learning . At the end of this article you should have a notion of Double Oracle algorithm, Deep Cognitive Hierarchies and Policy-Space Response Oracles . For this post you should be familiarized with some concepts about game theory, like the setup of the strategic game in form of the payoff matrix, the understanding of Nash Equilibria and best response.


Best Response Regression

Neural Information Processing Systems

In a regression task, a predictor is given a set of instances, along with a real value for each point. Subsequently, she has to identify the value of a new instance as accurately as possible. In this work, we initiate the study of strategic predictions in machine learning. We consider a regression task tackled by two players, where the payoff of each player is the proportion of the points she predicts more accurately than the other player. We first revise the probably approximately correct learning framework to deal with the case of a duel between two predictors. We then devise an algorithm which finds a linear regression predictor that is a best response to any (not necessarily linear) regression algorithm. We show that it has linearithmic sample complexity, and polynomial time complexity when the dimension of the instances domain is fixed. We also test our approach in a high-dimensional setting, and show it significantly defeats classical regression algorithms in the prediction duel. Together, our work introduces a novel machine learning task that lends itself well to current competitive online settings, provides its theoretical foundations, and illustrates its applicability.